CLAIMS 



We claim: 



1 1. An apparatus comprising: 

2 a data storage unit, of a cache, to speculatively 

3 provide data responsive to an access for an 

4 instruction; 

5 circuitry coupled to said data storage unit to 

6 perform one or more arithmetic logic unit (ALU) 

7 functions specified by said instruction on the 

8 speculatively provided data; 

9 hit/miss logic, of the cache, to determine if said 

10 access was a hit or a miss; and 

11 a replay mechanism coupled to the hit/miss logic to 

12 replay said instruction if the access is 

13 determined to be a miss. 



1 2. The apparatus of claim 1, wherein the replay 

2 mechanism includes a delay unit to delay a copy of the 

3 instruction for approximately a length of time required 

4 for the hit/miss logic to complete the determination. 



1 3. The apparatus of claim 2, wherein the length of time 

2 includes time additionally required for the circuitry to 

3 complete the one or more arithmetic logic unit (ALU) 

4 functions. 

1 4. The apparatus of claim 1, wherein the data storage 

2 unit and the circuitry are clocked at a faster frequency

3 than the hit/miss logic. 

1 5. The apparatus of claim 1, wherein said apparatus

2 includes: 

3 an execution subcore clocked at a faster frequency 

4 than said hit/miss logic, said execution subcore

5 including said data storage unit and said 

6 circuitry. 

1 6. A method comprising: 

2 fetching an instruction; 

3 accessing data from a cache location having stored 

4 therein data represented as possibly being 

5 associated with an address referenced by the 

6 instruction; 

7 performing one or more arithmetic logic unit (ALU) 

8 functions specified by the instruction on the 

9 data; 

10 determining that the data is not associated with the 

11 address referenced by the instruction, wherein 

12 the accessing is performed at a higher clock 

13 rate than the determining; and 

14 replaying, responsive to the determining, the 

15 instruction and any other instructions that 

16 received results of that instruction. 

1 7. The method of claim 6, wherein said performing is 

2 performed at a higher clock rate than the determining. 

1 8. The method of claim 6, wherein said performing is 

2 performed at the higher clock rate. 

1 9. The method of claim 6, wherein the replaying includes 

2 replaying a copy of the instruction that has been delayed 

3 for approximately a length of time required to complete 

4 the performing and the determining. 

1 10. A system comprising:

2 a main memory including a plurality of addresses; and 

3 a processor coupled to the main memory, including, 

4 a cache memory having: 

5 a data storage unit to store data 

6 represented as possibly being 

7 associated with a first address of the 

8 plurality of addresses, and 

9 hit/miss logic clocked at a slower 

10 frequency than the data storage unit 

11 to determine whether the data is 

12 associated with the first address, 

13 circuitry coupled to said data storage unit to 

14 perform arithmetic logic unit (ALU) 

15 functions, and 

16 a replay mechanism coupled to the hit/miss logic 

17 to replay a first instruction referencing 

18 the first address in response to the 

19 hit/miss logic determining that the data 

20 was not associated with the first address. 

1 11. The system of claim 10, wherein the replay mechanism 

2 also replays second instructions that received results of 

3 the first instruction in response to the hit/miss logic 

4 determining after the first instruction has been executed 

5 that the data was not associated with the first address.

1 12. The system of claim 10, wherein the replay mechanism 

2 includes a delay unit to delay a copy of the first 

3 instruction for approximately a length of time required to 

4 execute the first instruction and determine whether the 

5 data is associated with the first address. 

1 13. The system of claim 10, wherein the hit/miss logic is

2 clocked at a slower frequency than the circuitry. 

1 14. A method comprising:

2 fetching an instruction that references a first 

3 address of a plurality of addresses included in 

4 a main memory; 

5 accessing data from a cache's data storage unit, 

6 wherein the data is represented as being 

7 associated with one of said plurality of 

8 addresses; 

9 performing one or more arithmetic logic unit (ALU) 

10 operations specified by the instruction on the 

11 data; 

12 determining that the data is not associated with the 

13 first address, wherein the accessing is 

14 performed at a higher clock rate than the 

15 determining; and 

16 replaying, responsive to the determining, the 

17 instruction and any other instructions that 

18 received results of that instruction. 

1 15. The method of claim 14, wherein the replaying 

2 includes replaying a copy of the instruction that has been 

3 delayed for approximately a length of time required to 

4 complete the performing and the determining. 



1 16. The method of claim 14, wherein the performing is 

2 performed at a higher clock rate than the determining. 

1 17. A processor comprising: 

2 a cache including, 

3 a data storage unit having a plurality of cache

4 locations, and 

5 hit/miss logic; 

6 arithmetic logic unit (ALU) circuitry to operate on 

7 data provided by said data storage unit prior to 

8 verification of cache hit/misses for said data 

9 by said hit/miss logic; and 

10 a replay mechanism coupled to said hit/miss logic to 

11 replay instructions executed on data accessed 

12 from said data storage unit when such accesses 

13 are determined to be cache misses by said 

14 hit/miss logic. 

1 18. The processor of claim 17, wherein the hit/miss logic 

2 is clocked at a slower frequency than said data storage 

3 unit. 

1 19. The processor of claim 18, wherein a frequency at 

2 which the data storage unit is clocked is an integer 

3 multiple of the slower frequency. 

1 20. The processor of claim 18, wherein the hit/miss logic 

2 is clocked at a slower frequency than said arithmetic 

3 logic unit (ALU) circuitry. 

1 21. A method comprising: 

2 fetching an instruction specifying an operation on 

3 data stored at an address; 

4 accessing contents of a cache location based on said 

5 address; 

6 executing said instruction based on said accessed 

7 contents; 

8 determining whether said accessing resulted in a

9 cache hit, wherein the accessing is performed at 

10 a faster clock frequency than the determining; 

11 and 

12 if said accessing did not result in a cache hit, 

13 repeating said accessing, executing, and 

14 determining. 

1 22. The method of claim 21, wherein the faster clock 

2 frequency is an integer multiple of a frequency at which 

3 the determining is performed. 

1 23. The method of claim 21, wherein the executing is 

2 performed at a faster clock frequency than the 

3 determining. 

1 24. The method of claim 21, wherein said executing is 

2 performed at the faster clock frequency. 

1 25. A method comprising: 

2 accessing data from a cache location having stored 

3 therein data represented as possibly being 

4 associated with an address referenced by an 

5 instruction; 

6 performing one or more arithmetic logic unit (ALU) 

7 functions specified by the instruction on the 

8 data; 

9 determining that the data is not associated with the 

10 address referenced by the instruction; and 

11 replaying, responsive to the determining, the 

12 instruction and any other instructions that 

13 received results of that instruction. 

1 26. The method of claim 25, wherein the replaying

2 includes replaying a copy of the instruction that has been 

3 delayed for approximately a length of time required to 

4 complete the performing and the determining. 

1 27. The method of claim 25, wherein the accessing and the 

2 performing are performed at a higher clock rate than the 

3 determining. 